Subgroup Discovery for Defect Prediction
Abstract
Although the literature on software defect prediction is extensive, machine learning approaches, and Subgroup Discovery (SD) techniques in particular, have yet to be fully explored. SD algorithms aim to find subgroups of data that are statistically different with respect to a property of interest [1,2]. SD lies between predictive tasks (finding rules given historical data and a property of interest) and descriptive tasks (discovering interesting patterns in data). An important difference from classification is that SD algorithms focus only on finding subgroups (e.g., by inducing rules) for the property of interest and do not necessarily describe all instances in the dataset. In this preliminary study, we compare two well-known algorithms, the Subgroup Discovery algorithm [3] and the CN2-SD algorithm [4], by applying them to several datasets from the publicly available PROMISE repository [5] and to the Bug Prediction Dataset created by D’Ambros et al. [6]. The comparison uses quality measures adapted from classification measures. The results show that the generated models can be used to guide testing effort. The parameters of the SD algorithms can be adjusted to balance the specificity and generality of a rule, so that the selected rules can be considered good enough by software engineering standards. The induced rules are simple to use and easy to understand. Further work is needed with more datasets and with other SD algorithms that approach the discovery of subgroups differently (e.g., in their handling of continuous attributes, discretization, quality measures, etc.).
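To make the trade-off between specificity and generality concrete, the following minimal sketch scores a single candidate rule on a toy defect dataset. It is an illustration only: the abstract does not name the quality measures used in the study, so the choice of coverage, precision, recall, and weighted relative accuracy (WRAcc, the measure commonly associated with CN2-SD) is an assumption, and the module data, the metric names `loc` and `v_g`, and the helper `evaluate_rule` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Module = Dict[str, Any]  # one software module described by static metrics


@dataclass
class RuleQuality:
    coverage: float   # fraction of all modules matched by the rule (generality)
    precision: float  # fraction of matched modules that are defective (specificity)
    recall: float     # fraction of all defective modules matched by the rule
    wracc: float      # weighted relative accuracy: coverage * (precision - base rate)


def evaluate_rule(modules: List[Module],
                  condition: Callable[[Module], bool],
                  target: str = "defective") -> RuleQuality:
    """Score the rule 'condition -> target' on a list of modules."""
    if not modules:
        raise ValueError("modules must not be empty")
    n = len(modules)
    covered = [m for m in modules if condition(m)]
    n_pos = sum(1 for m in modules if m[target])
    true_pos = sum(1 for m in covered if m[target])

    coverage = len(covered) / n
    precision = true_pos / len(covered) if covered else 0.0
    recall = true_pos / n_pos if n_pos else 0.0
    # WRAcc rewards rules that are both general (high coverage) and
    # unusual (precision well above the overall defect rate).
    wracc = coverage * (precision - n_pos / n)
    return RuleQuality(coverage, precision, recall, wracc)


# Hypothetical toy data: lines of code, cyclomatic complexity, defect label.
MODULES: List[Module] = [
    {"loc": 120, "v_g": 15, "defective": True},
    {"loc": 30,  "v_g": 2,  "defective": False},
    {"loc": 200, "v_g": 22, "defective": True},
    {"loc": 45,  "v_g": 4,  "defective": False},
    {"loc": 150, "v_g": 12, "defective": False},
    {"loc": 20,  "v_g": 1,  "defective": False},
    {"loc": 90,  "v_g": 11, "defective": True},
    {"loc": 60,  "v_g": 3,  "defective": False},
]

# Candidate subgroup: modules with cyclomatic complexity above 10.
quality = evaluate_rule(MODULES, lambda m: m["v_g"] > 10)
print(quality)
```

On this toy data the rule covers half of the modules, three quarters of which are defective, giving a WRAcc of 0.1875. Tightening the threshold would typically raise precision at the cost of coverage, which is the kind of balance the algorithm parameters are meant to control.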
Related resources
A study of subgroup discovery approaches for defect prediction
Context: Although many papers have been published on software defect prediction techniques, machine learning approaches have yet to be fully explored. Objective: In this paper we suggest using a descriptive approach for defect prediction rather than the precise classification techniques that are usually adopted. This allows us to characterise defective modules with simple rules that can easily ...
Searching for rules to detect defective modules: A subgroup discovery approach
Data mining methods in software engineering are becoming increasingly important as they can support several aspects of the software development life-cycle such as quality. In this work, we present a data mining approach to induce rules extracted from static software metrics characterising fault-prone modules. Due to the special characteristics of the defect prediction data (imbalanced, inconsis...
Song, H., & Flach, P. (2015). Model Reuse with Subgroup Discovery. In Proceedings of the ECML/PKDD 2015 Discovery Challenges: co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2015) (CEUR
In this paper we describe a method to reuse models with Model-Based Subgroup Discovery (MBSD), which is an extension of the Subgroup Discovery scheme. The task is to predict the number of bikes at a new rental station 3 hours in advance. Instead of training new models with the limited data from these new stations, our approach first selects a number of pre-trained models from old rental stations...
RSD: Relational Subgroup Discovery through First-Order Feature Construction
Relational rule learning is typically used in solving classification and prediction tasks. However, relational rule learning can be adapted also to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved through appropriately adapting rule learning and first-order feature construction. The proposed approach, applicable to subgroup disco...
Using constraints in relational subgroup discovery
Relational rule learning is typically used in solving classification and prediction tasks. However, it can also be adapted to the description task of subgroup discovery. This paper takes a propositionalization approach to relational subgroup discovery (RSD), based on adapting rule learning and first-order feature construction, applicable in individual-centered domains. It focuses on the use of c...
Publication date: 2011